Improving Pronunciation Accuracy of Proper Names with Language Origin Classes
Abstract
Pronouncing proper names that come from many different source languages is an extremely hard task, even for humans. This thesis presents an attempt to improve the automatic pronunciation of proper names by modeling the way humans do it, and tries to eliminate synthesis errors that humans would never make. It does so by taking into account the different source languages and language families and by adding such information as features to the pronunciation models, either directly or indirectly. This approach does improve pronunciation accuracy; however, to assess its true merit, we would need to develop a more accurate language identifier. Ultimately, the data we would like to have for training our models is a list of proper names tagged with both their phonetic transcription and the language they come from. A new approach this thesis begins to investigate is the unsupervised clustering of proper names to derive language classes in a data-driven way. With this approach, no language classes (Catalan, English, French, German, etc.) need to be determined a priori; rather, they are inferred from the names and their pronunciations. The clustering method takes into account letter trigrams as well as their aligned pronunciations at training time. Experiments using the classes derived from unsupervised clustering are still preliminary and have not yet yielded an improvement in the pronunciation accuracy of proper names.
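To make the role of letter trigrams concrete, the following is a minimal sketch of a trigram-based language-origin guesser. It is not the thesis's clustering method or its language identifier: the function names, the `#` boundary marker, the add-one smoothing, and the toy name lists are all assumptions chosen for illustration, and the aligned pronunciations mentioned in the abstract are left out.

```python
from collections import Counter
import math


def letter_trigrams(name):
    """Return the letter trigrams of a name, padded with '#' boundary markers."""
    padded = "#" + name.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train_trigram_models(names_by_language):
    """Count letter trigrams per language class (toy, hypothetical training data)."""
    return {
        lang: Counter(t for name in names for t in letter_trigrams(name))
        for lang, names in names_by_language.items()
    }


def guess_language(name, models):
    """Pick the class whose add-one-smoothed trigram model scores the name highest."""
    vocab = {t for counts in models.values() for t in counts}

    def log_prob(counts):
        # One extra slot in the denominator stands in for unseen trigrams.
        total = sum(counts.values()) + len(vocab) + 1
        return sum(math.log((counts[t] + 1) / total) for t in letter_trigrams(name))

    return max(models, key=lambda lang: log_prob(models[lang]))


# Hypothetical toy data; a real system would need a large tagged name list.
models = train_trigram_models({
    "German": ["Schmidt", "Schneider", "Schulz"],
    "French": ["Dubois", "Dupont", "Durand"],
})
print(guess_language("Schubert", models))
print(guess_language("Dumas", models))
```

A predicted class of this kind could then be passed to a pronunciation model as an extra feature, which is the direct use of language information the abstract describes. The unsupervised variant would replace the tagged training lists with clusters inferred from the names themselves.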
Similar resources
Knowledge of language origin improves pronunciation accuracy of proper names
As it is impossible to have a lexicon with complete coverage, and a high proportion of unknown words are proper names, this paper addresses the issue of automatically finding pronunciations of unseen proper names in US English. Proper names, especially in the US, may come from a large range of ethnic backgrounds. We present a model and results showing that including ethnic origin of words in a ...
G2P Conversion of Proper Names Using Word Origin Information
Motivated by the fact that the pronunciation of a name may be influenced by its language of origin, we present methods to improve pronunciation prediction of proper names using word origin information. We train grapheme-to-phoneme (G2P) models on language-specific data sets and interpolate the outputs. We perform experiments on US surnames, a data set where word origin variation occurs naturall...
Basis Identification for Automatic Creation of Pronunciation Lexicon for Proper Names
Development of a proper names pronunciation lexicon is usually a manual effort which cannot be avoided. Grapheme to phoneme (G2P) conversion modules, in the literature, are usually rule based and work best for non-proper names in a particular language. Proper names are foreign to a G2P module. We follow an optimization approach to enable automatic construction of proper names pronunciation lexicon...
Generating proper name pro for automatic speech
Generating correct pronunciation of proper names remains one of the most difficult tasks in text-to-phoneme transcription. Although phonetic rules can be efficient in processing proper names of one language, foreign family names cannot always be correctly generated without additional pronunciation rules. The present study addresses the problem of pronunciation variants for French and foreign fa...
Proper Name Machine Translation from Japanese to Japanese Sign Language
This paper describes machine translation of proper names from Japanese to Japanese Sign Language (JSL). “Proper name transliteration” is a kind of machine translation of proper names between spoken languages and involves character-to-character conversion based on pronunciation. However, transliteration methods cannot be applied to Japanese-JSL machine translation because proper names in JSL are ...